NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Analyzing 16,193 LLM Papers for Fun and Profits

Xia, Zhiqiu; Zhu, Lang; Li, Bingzhe; Chen, Feng; Li, Qiannan; Liao, Chunhua; Wang, Feiyi; Liu, Hang (June 2025, IEEE CloudSummit 2025)

Free, publicly-accessible full text available June 26, 2026
Analyzing 16,193 LLM Papers for Fun and Profits

Xia, Zhiqiu; Zhu, Lang; Li, Bingzhe; Chen, Feng; Li, Qiannan; Liao, Chunhua; Wang, Feiyi; Liu, Hang (June 2025, IEEE CloudSummit 2025)

Free, publicly-accessible full text available June 26, 2026
Reductive Analysis with Compiler-Guided Large Language Models for Input-Centric Code Optimizations

https://doi.org/10.1145/3729282

Wang, Xiangwei; Hui, Xinning; Liao, Chunhua; Shen, Xipeng (June 2025, Proceedings of the ACM on Programming Languages)

Input-centric program optimization aims to optimize code by considering the relations between program inputs and program behaviors. Despite its promise, a long-standing barrier for its adoption is the difficulty of automatically identifying critical features of complex inputs. This paper introduces a novel technique,reductive analysis through compiler-guided Large Language Models (LLMs), to solve the problem through a synergy between compilers and LLMs. It uses a reductive approach to overcome the scalability and other limitations of LLMs in program code analysis. The solution, for the first time, automates the identification of critical input features without heavy instrumentation or profiling, cutting the time needed for input identification by 44× (or 450× for local LLMs), reduced from 9.6 hours to 13 minutes (with remote LLMs) or 77 seconds (with local LLMs) on average, making input characterization possible to be integrated into the workflow of program compilations. Optimizations on those identified input features show similar or even better results than those identified by previous profiling-based methods, leading to optimizations that yield 92.6% accuracy in selecting the appropriate adaptive OpenMP parallelization decisions, and 20–30% performance improvement of serverless computing while reducing resource usage by 50–60%.
more » « less
Free, publicly-accessible full text available June 10, 2026
Generating and Analyzing Program Call Graphs using Ontology

https://doi.org/10.1109/ProTools56701.2022.00008

Dorta, Ethan; Yan, Yonghong; Liao, Chunhua (November 2022, 2022 IEEE/ACM Workshop on Programming and Performance Visualization Tools (ProTools))

Call graph or caller-callee relationships have been used for various kinds of static program analysis, performance analysis and profiling, and for program safety or security analysis such as detecting anomalies of program execution or code injection attacks. However, different tools generate call graphs in different formats, which prevents efficient reuse of call graph results. In this paper, we present an approach of using ontology and resource description framework (RDF) to create knowledge graphs for specifying call graphs to facilitate the construction of full-fledged and complex call graphs of computer programs, realizing more interoperable and scalable program analyses than conventional approaches. We create a formal ontology-based specification of call graph information to capture concepts and properties of both static and dynamic call graphs so different tools can collaboratively contribute to more comprehensive analysis results. Our experiments show that ontology enables merging of call graphs generated from different tools and flexible queries using a standard query interface.
more » « less
Full Text Available
Modeling optimization of stencil computations via domain-level properties

https://doi.org/10.1145/3528425.3529103

Nesterenko, Brandon; Yi, Qing; Lin, Pei-Hung; Liao, Chunhua; Runnels, Brandon (April 2022, Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores)

Stencil computations are widely used in the scientific simulation domain, and their performance is critical to the overall efficiency of many large-scale numerical applications. Many optimization techniques, most of them varying strategies of tiling and parallelization, exist to systematically enhance the efficiency of stencil computations. However, the effective- ness of these optimizations vary significantly depending on the wide range of properties demonstrated by the different stencils. This paper studies several well-known optimization strategies for stencils and presents a new approach to effectively guide the composition of these optimizations, by modeling their interactions with four domain-level proper- ties of stencils: spatial dimensionality, temporal order, order of accuracy, and directional dependence. When using our prediction model to guide optimizations for five real-world stencil applications, we were able to identify optimization strategies that outperformed two highly optimized stencil libraries by an average of 2.4x.
more » « less
Full Text Available
RDS: a cloud-based metaservice for detecting data races in parallel programs

https://doi.org/10.1145/3468737.3494089

Shi, Yaying; Wang, Anjia; Yan, Yonghong; Liao, Chunhua (December 2021, UCC '21: Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing)

Data races are notorious concurrency bugs which can cause severe problems, including random crashes and corrupted execution results. However, existing data race detection tools are still challenging for users to use. It takes a significant amount of effort for users to install, configure and properly use a tool. A single tool often cannot find all the bugs in a program. Requiring users to use multiple tools is often impracticable and not productive because of the differences in tool interfaces and report formats. In this paper, we present a cloud-based, service-oriented design and implementation of a race detection service (RDS)1 to detect data races in parallel programs. RDS integrates multiple data race detection tools into a single cloud-based service via a REST API. It defines a standard JSON format to represent data race detection results, facilitating producing user-friendly reports, aggregating output of multiple tools, as well as being easily processed by other tools. RDS also defines a set of policies for aggregating outputs from multiple tools. RDS significantly simplifies the workflow of using data race detection tools and improves the report quality and productivity of performing race detection for parallel programs. Our evaluation shows that RDS can deliver more accurate results with much less effort from users, when compared with the traditional way of using any individual tools. Using four selected tools and DataRaceBench, RDS improves the Adjusted F-1 scores by 8.8% and 12.6% over the best and the average scores, respectively. For the NAS Parallel Benchmark, RDS improves 35% of the adjusted accuracy compared to the average of the tools. Our work studies a new approach of composing software tools for parallel computing via a service-oriented architecture. The same approach and framework can be used to create metaservice for compilers, performance tools, auto-tuning tools, and so on.
more » « less
Full Text Available
CUDAMicroBench: Microbenchmarks to Assist CUDA Performance Programming

https://doi.org/10.1109/IPDPSW52791.2021.00068

Yi, Xinyao; Stokes, David; Yan, Yonghong; Liao, Chunhua (June 2021, 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW))

Programming to achieve high performance for NVIDIA GPUs using CUDA has been known to be challenging. A GPU has hundreds or thousands of cores that a program must exhibit sufficient parallelism to achieve maximum GPU utilization. A system with GPU accelerators has a heterogeneous and deep memory system that programmers must effectively and correctly use to fully take advantage of the GPU's parallelism capability. In this paper, we present CUDAMicroBench, a collection of fourteen microbenchmarks that demonstrate performance challenges in CUDA programming and techniques to optimize the CUDA programs to address these challenges. It also includes examples and techniques for using advanced CUDA features such as data shuffling between threads, dynamic parallelism, etc that can help users optimize the CUDA program for performance. The microbenchmark can be used for evaluating the performance of GPU architectures, the memory systems of GPU itself and of the whole system architectures, and for evaluating the effectiveness of compiler and performance tools for performance analysis. It can be used to help users understand the complexity of heterogeneous GPU-accelerator systems through examples and guide users for performance optimization. It is released as BSD-licensed open-source from https://github.com/passlab/CUDAMicroBench.git.
more » « less
Full Text Available
Deep NLP-Based Co-Evolvement for Synthesizing Code Analysis from Natural Language

https://doi.org/10.1145/3446804.3446852

Nan, Zifan; Guan, Hui; Shen, Xipeng; Liao, Chunhua (February 2021, Compiler Construction)
null (Ed.)
Full Text Available
FreeCompilerCamp.org: Training for OpenMP Compiler Development from Cloud

Wang, Anjia; Mishra, Alok; Liao, Chunhua; Yan, Yonghong; Chapman, Barbara (January 2020, Journal of computational science education)

OpenMP is one of the most popular programming models to exploit node-level parallelism of supercomputers. Many researchers are interested in developing OpenMP compilers or extending existing standard for new capabilities. However, there is a lack of training resources for researchers who are involved in the compiler and language development around OpenMP, making learning curve in this area steep. In this paper, we introduce an ongoing effort, FreeCompilerCamp.org, a free and open online learning platform aimed to train researchers to quickly develop OpenMP compilers. The platform is built on top of Play-With-Docker, a docker playground for users to conduct experiments in an online terminal sandbox. It provides a live training website that is set up on cloud, so anyone with internet access and a web browser will be able to take the training. It also enables developers with relevant skills to contribute new tutorials. The entire training system is open-source and can be deployed on a private server, workstation or even laptop for personal use. We have created some initial tutorials to train users to learn how to extend the Clang/LLVM and ROSE compiler to support new OpenMP features. We welcome anyone to try out our system, give us feedback, contribute new training courses, or enhance the training platform to make it an effective learning resource for the HPC community.
more » « less
Full Text Available
XUnified: A Framework for Guiding Optimal Use of GPU Unified Memory

https://doi.org/10.1109/ACCESS.2022.3196008

Xu, Hailu; Lin, Pei-Hung; Emani, Murali; Hu, Liting; Liao, Chunhua (January 2022, IEEE Access)

Search for: All records